Group N: Phase 1 - Cats vs Dogs Detector (CaDoD)¶

Team Members¶

Kangle Li, kl66@iu.edu

Genevieve Mortensen, gamorten@iu.edu

Sean Dixit, sedixit@iu.edu

Phase Leader Plan

Credit Assignment Plan

Gantt Diagram

Project Proposal¶

Multi-Task Object Detection and Localization for Cats and Dogs (CaDoD)¶

In our project, we aim to develop a comprehensive system for Cats and Dogs detection (CaDoD) using a range of machine learning techniques. Our team will take a phased approach, starting from baseline models and gradually advancing to more sophisticated deep learning architectures. We will leverage traditional machine learning algorithms, such as logistic regression implemented both with SKLearn and from scratch, along with deep learning frameworks such as PyTorch. We plan to first create a baseline model, then tune its hyperparameters and compare results to see whether performance improves. Additionally, we will explore transfer learning with EfficientDet and Swin transformers to enhance our model's performance. Through this project, we seek not only to classify cats and dogs accurately but also to localize them within images by predicting bounding boxes. Through experimentation and iteration, we aim to deliver a robust, high-performing solution capable of identifying and localizing cats and dogs in diverse real-world scenarios.

We will be using a subset of the Open Images Dataset V6, a large-scale dataset curated by Google to facilitate computer vision research and development. It contains millions of labeled images spanning a wide variety of categories, making it a valuable resource for training and evaluating machine learning models. Our subset contains 12,966 images of dogs and cats.

Why use logistic regression?
Logistic regression is easily interpretable and effective for binary classification. We can modify the loss function to combine Cross-Entropy (CXE) with Mean Squared Error (MSE) for a multitask learning approach. Vanilla logistic regression models are designed for classification only; by extending the loss function to include MSE, we adapt the model to perform both classification and regression (the regression targets being the bounding-box coordinates).
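As a sketch of what this combined objective looks like, the snippet below computes CXE + MSE for a batch; the weighting factor `lam` between the two terms is our own assumption, not something fixed above:

```python
import numpy as np

def multitask_loss(y_true, p_pred, bbox_true, bbox_pred, lam=1.0):
    """Combined multitask loss: binary cross-entropy on the class
    probability plus lam times the MSE over the 4 box coordinates."""
    eps = 1e-12  # avoid log(0)
    cxe = -np.mean(y_true * np.log(p_pred + eps)
                   + (1 - y_true) * np.log(1 - p_pred + eps))
    mse = np.mean((bbox_true - bbox_pred) ** 2)
    return cxe + lam * mse
```

With `lam=1` the two tasks are weighted equally; tuning `lam` up or down trades localization accuracy against classification accuracy.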

How to measure success?

  • Accuracy: for the classification task (cats vs dogs), the percentage of correctly classified instances.
  • Precision and Recall: useful with imbalanced datasets or when one class's false positives/negatives are more critical.
  • Mean Squared Error (MSE): the average squared difference between predicted and actual values, giving insight into the precision of the bounding-box predictions.
  • Intersection over Union (IoU): an additional metric for bounding-box accuracy; IoU measures the overlap between the predicted and ground-truth boxes, a direct spatial indication of prediction accuracy.
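IoU can be computed directly from two boxes; a minimal sketch, assuming boxes in [x_min, y_min, x_max, y_max] form (the helper name `iou` is ours):

```python
def iou(box_a, box_b):
    """Intersection over Union for boxes in [x_min, y_min, x_max, y_max] form."""
    # corners of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Identical boxes score 1.0, disjoint boxes score 0.0, and partial overlaps fall in between.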

Description of pipeline steps:

  • Data Collection & Preprocessing: gather a dataset of cat and dog images annotated with class labels and bounding-box coordinates; preprocess for consistency, normalization, and augmentation to improve model robustness.
  • Feature Extraction: transform raw image data into a format suitable for the logistic regression model, possibly using techniques like PCA for dimensionality reduction when dealing with raw pixel values.
  • Model Implementation: develop the logistic regression model from scratch, extending the loss function to include both Cross-Entropy (for classification) and Mean Squared Error (for bounding-box regression).
  • Training: train the model on the prepared dataset, using the combined CXE + MSE loss to simultaneously learn classification and bounding-box prediction.
  • Evaluation: use the success metrics (Accuracy, Precision, Recall, MSE, IoU) to evaluate the model's performance on a separate test set.
  • Fine-tuning and Optimization: adjust model parameters, learning rate, or preprocessing steps based on performance to improve outcomes.

Project Description¶

The purpose of this project is to create an end-to-end machine learning pipeline for an object detector for cats and dogs. There are about 13,000 RGB images of varying shapes and aspect ratios, with bounding-box coordinates stored in a .csv file. To create a detector, we will first preprocess the images to a common shape, take their RGB intensity values, and flatten each image from a 3D array to a row of a 2D array. We will then feed this array into a linear classifier and a linear regressor to predict labels and bounding boxes.

Data Description¶


The image archive cadod.tar.gz is a subset of Open Images V6. It contains a total of 12,966 images of dogs and cats.

Image bounding boxes are stored in the csv file cadod.csv. The following describes what each column contains.

  • ImageID: the image this box lives in.
  • Source: indicates how the box was made:
    • xclick are manually drawn boxes using the method presented in [1], where the annotators click on the four extreme points of the object. In V6 we release the actual 4 extreme points for all xclick boxes in train (13M), see below.
    • activemil are boxes produced using an enhanced version of the method [2]. These are human verified to be accurate at IoU>0.7.
  • LabelName: the MID of the object class this box belongs to.
  • Confidence: a dummy value, always 1.
  • XMin, XMax, YMin, YMax: coordinates of the box, in normalized image coordinates. XMin is in [0,1], where 0 is the leftmost pixel, and 1 is the rightmost pixel in the image. Y coordinates go from the top pixel (0) to the bottom pixel (1).
  • XClick1X, XClick2X, XClick3X, XClick4X, XClick1Y, XClick2Y, XClick3Y, XClick4Y: normalized image coordinates (as XMin, etc.) of the four extreme points of the object that produced the box using [1] in the case of xclick boxes. Dummy values of -1 in the case of activemil boxes.

The attributes have the following definitions:

  • IsOccluded: Indicates that the object is occluded by another object in the image.
  • IsTruncated: Indicates that the object extends beyond the boundary of the image.
  • IsGroupOf: Indicates that the box spans a group of objects (e.g., a bed of flowers or a crowd of people). We asked annotators to use this tag for cases with more than 5 instances which are heavily occluding each other and are physically touching.
  • IsDepiction: Indicates that the object is a depiction (e.g., a cartoon or drawing of the object, not a real physical instance).
  • IsInside: Indicates a picture taken from the inside of the object (e.g., a car interior or inside of a building). For each of them, value 1 indicates present, 0 not present, and -1 unknown.
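To make the normalized coordinates concrete, here is a small sketch converting one row's box into pixel coordinates; the helper name `to_pixel_box` is ours, not part of the dataset tooling:

```python
def to_pixel_box(row, width, height):
    """Convert a row's normalized [XMin, XMax, YMin, YMax] values
    into pixel coordinates (left, top, right, bottom)."""
    return (row['XMin'] * width, row['YMin'] * height,
            row['XMax'] * width, row['YMax'] * height)

# e.g. a box covering the central half of a 640x480 image
box = to_pixel_box({'XMin': 0.25, 'XMax': 0.75, 'YMin': 0.25, 'YMax': 0.75}, 640, 480)
# → (160.0, 120.0, 480.0, 360.0)
```

The same call works on a pandas row from cadod.csv, since Series support the same key lookups.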
In [ ]:
from collections import Counter
import glob
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
from PIL import Image
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import SGDClassifier, SGDRegressor
from sklearn.metrics import accuracy_score, mean_squared_error, roc_auc_score
from sklearn.model_selection import train_test_split
import tarfile
from tqdm.notebook import tqdm
import warnings

Import Data¶

Unarchive data¶

In [ ]:
def extract_tar(file, path):
    """
    Extract a tar.gz archive to the specified location,
    skipping any members that already exist on disk.

    Args:
        file (str): path to the .tar.gz archive
        path (str): directory to extract into
    """
    with tarfile.open(file) as tar:  # the with-block closes the archive for us
        files_extracted = 0
        for member in tqdm(tar.getmembers()):
            # member names start with '.', e.g. './img.jpg', so strip the leading dot
            if os.path.isfile(path + member.name[1:]):
                continue
            tar.extract(member, path)
            files_extracted += 1
        if files_extracted < 3:
            print('Files already exist')
In [ ]:
path = 'images/'

extract_tar('cadod.tar.gz', path)
Files already exist
In [4]:
!ls -l 
total 14336
-rwxr-xr-x 1 root root 4634116 Apr  9  2021 CaDoD_Phase_1_baseline.ipynb
-rwxr-xr-x 1 root root 4633634 Nov 12 22:35 CaDoD_Phase_1_baseline_SKLearn_homegrown.ipynb
-rwxr-xr-x 1 root root 3300747 May  8  2021 CaDoD_Phase_2_PyTorch.ipynb
drwxr-xr-x 4 root root     128 Nov 12 22:24 Phase_1_Digit_detector_MLP
drwxr-xr-x 7 root root     224 Aug  5 23:14 Phase_2_cats_dogs_detector_Efficient_Det
drwxr-xr-x 3 root root      96 Aug 15 04:10 Phase_3_Cats_and_Dogs_SSD_3_levels_of_detection

Load bounding box meta data¶

In [2]:
import pandas as pd
df = pd.read_csv('cadod.csv')

In [ ]:
df.head()
Out[ ]:
ImageID Source LabelName Confidence XMin XMax YMin YMax IsOccluded IsTruncated ... IsInside XClick1X XClick2X XClick3X XClick4X XClick1Y XClick2Y XClick3Y XClick4Y area
0 0000b9fcba019d36 xclick dog 1 0.165000 0.903750 0.268333 0.998333 1 1 ... 0 0.636250 0.903750 0.748750 0.165000 0.268333 0.506667 0.998333 0.661667 0.539288
1 0000cb13febe0138 xclick dog 1 0.000000 0.651875 0.000000 0.999062 1 1 ... 0 0.312500 0.000000 0.317500 0.651875 0.000000 0.410882 0.999062 0.999062 0.651264
2 0005a9520eb22c19 xclick dog 1 0.094167 0.611667 0.055626 0.998736 1 1 ... 0 0.487500 0.611667 0.243333 0.094167 0.055626 0.226296 0.998736 0.305942 0.488059
3 0006303f02219b07 xclick dog 1 0.000000 0.999219 0.000000 0.998824 1 1 ... 0 0.508594 0.999219 0.000000 0.478906 0.000000 0.375294 0.720000 0.998824 0.998044
4 00064d23bf997652 xclick dog 1 0.240938 0.906183 0.000000 0.694286 0 0 ... 0 0.678038 0.906183 0.240938 0.522388 0.000000 0.370000 0.424286 0.694286 0.461870

5 rows × 22 columns

Exploratory Data Analysis¶

Statistics¶

In [ ]:
print(f"There are a total of {len(glob.glob1(path, '*.jpg'))} images")
There are a total of 12966 images
In [ ]:
# os.path.getsize on a directory returns the directory entry size, not its contents,
# so sum the individual image files instead
total_bytes = sum(os.path.getsize(path + f) for f in glob.glob1(path, '*.jpg'))
print(f"The total size is {total_bytes/1e6:.1f} MB")
In [ ]:
df.shape
Out[ ]:
(12966, 22)

Replace LabelName with human readable labels

In [ ]:
df.LabelName.replace({'/m/01yrx':'cat', '/m/0bt9lr':'dog'}, inplace=True)
In [ ]:
df.LabelName.value_counts()
Out[ ]:
dog    6855
cat    6111
Name: LabelName, dtype: int64
In [ ]:
df.LabelName.value_counts().plot(kind='bar')
plt.title('Image Class Count')
plt.show()
In [ ]:
df.describe()
Out[ ]:
Confidence XMin XMax YMin YMax IsOccluded IsTruncated IsGroupOf IsDepiction IsInside XClick1X XClick2X XClick3X XClick4X XClick1Y XClick2Y XClick3Y XClick4Y area
count 12966.0 12966.000000 12966.000000 12966.000000 12966.000000 12966.000000 12966.000000 12966.000000 12966.000000 12966.000000 12966.000000 12966.000000 12966.000000 12966.000000 12966.000000 12966.000000 12966.000000 12966.000000 12966.000000
mean 1.0 0.099437 0.901750 0.088877 0.945022 0.464754 0.738470 0.013651 0.045427 0.001157 0.390356 0.424582 0.494143 0.506689 0.275434 0.447448 0.641749 0.582910 0.688754
std 0.0 0.113023 0.111468 0.097345 0.081500 0.499239 0.440011 0.118019 0.209354 0.040229 0.358313 0.441751 0.405033 0.462281 0.415511 0.401580 0.448054 0.403454 0.179648
min 1.0 0.000000 0.408125 0.000000 0.451389 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 0.400178
25% 1.0 0.000000 0.830625 0.000000 0.910000 0.000000 0.000000 0.000000 0.000000 0.000000 0.221293 0.096875 0.285071 0.130000 0.024323 0.218333 0.405817 0.400000 0.532997
50% 1.0 0.061250 0.941682 0.059695 0.996875 0.000000 1.000000 0.000000 0.000000 0.000000 0.435625 0.415625 0.531919 0.623437 0.146319 0.480839 0.825000 0.646667 0.676201
75% 1.0 0.167500 0.998889 0.144853 0.999062 1.000000 1.000000 0.000000 0.000000 0.000000 0.609995 0.820000 0.787500 0.917529 0.561323 0.729069 0.998042 0.882500 0.835382
max 1.0 0.592500 1.000000 0.587088 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 0.999375 0.999375 1.000000 0.999375 0.999375 0.999375 1.000000 0.999375 1.000000

Sample of Images¶

In [ ]:
# plot random 6 images
fig, ax = plt.subplots(nrows=2, ncols=3, sharex=False, sharey=False,figsize=(15,10))
ax = ax.flatten()

for i,j in enumerate(np.random.choice(df.shape[0], size=6, replace=False)):
    img = mpimg.imread(path + df.ImageID.values[j] + '.jpg')
    h, w = img.shape[:2]
    coords = df.iloc[j,4:8]
    ax[i].imshow(img)
    ax[i].set_title(df.LabelName[j])
    ax[i].add_patch(plt.Rectangle((coords[0]*w, coords[2]*h), 
                                  coords[1]*w-coords[0]*w, coords[3]*h-coords[2]*h, 
                                  edgecolor='red', facecolor='none'))

plt.tight_layout()
plt.show()

Image shapes and sizes¶

Go through all images and record the shape of the image in pixels and the memory size

In [ ]:
img_shape = []
img_size = np.zeros((df.shape[0], 1))

for i,f in enumerate(tqdm(glob.glob1(path, '*.jpg'))):
    file = path+'/'+f
    img = Image.open(file)
    img_shape.append(f"{img.size[0]}x{img.size[1]}")
    img_size[i] += os.path.getsize(file)

Count all the different image shapes

In [ ]:
img_shape_count = Counter(img_shape)
In [ ]:
# create a dataframe for image shapes
img_df = pd.DataFrame(list(img_shape_count.items()), columns=['img_shape','img_count'])
In [ ]:
img_df.shape
Out[ ]:
(594, 2)

There are a ton of different image shapes. Let's narrow this down by summing the counts of every shape that appears fewer than 100 times into a single category called other

In [ ]:
img_df = img_df.append({'img_shape': 'other','img_count': img_df[img_df.img_count < 100].img_count.sum()}, 
                       ignore_index=True)

Drop all image shapes with counts below 100 (now covered by the other category)

In [ ]:
img_df = img_df[img_df.img_count >= 100]

Check if the count sum matches the number of images

In [ ]:
img_df.img_count.sum() == df.shape[0]
Out[ ]:
True

Plot

Plot aspect ratio¶

In [ ]:
img_df.sort_values('img_count', inplace=True)
img_df.plot(x='img_shape', y='img_count', kind='barh', figsize=(8,8), legend=False)
plt.title('Image Shape Counts')
plt.show()
In [ ]:
# convert bytes to kilobytes
img_size = img_size / 1000
In [ ]:
fig, ax = plt.subplots(1, 2, figsize=(15,5))
fig.suptitle('Image Size Distribution')
ax[0].hist(img_size, bins=50)
ax[0].set_title('Histogram')
ax[0].set_xlabel('Image Size (KB)')
ax[1].boxplot(img_size, vert=False, widths=0.5)
ax[1].set_title('Boxplot')
ax[1].set_xlabel('Image Size (KB)')
ax[1].set_ylabel('Images')
plt.show()

Preprocess¶

Rescale the images¶

In [ ]:
!mkdir -p images/resized
In [ ]:
%%time
# resize image and save, convert to numpy

img_arr = np.zeros((df.shape[0],128*128*3)) # initialize np.array

for i, f in enumerate(tqdm(df.ImageID)):
    img = Image.open(path+f+'.jpg').convert('RGB')  # ensure 3 channels (some JPEGs are grayscale)
    img_resized = img.resize((128,128))
    img_resized.save("images/resized/"+f+'.jpg', "JPEG", optimize=True)
    img_arr[i] = np.asarray(img_resized, dtype=np.uint8).flatten()
CPU times: user 1min 51s, sys: 10.3 s, total: 2min 1s
Wall time: 3min 24s

Plot the resized and filtered images

In [ ]:
# plot random 6 images
fig, ax = plt.subplots(nrows=2, ncols=3, sharex=False, sharey=False,figsize=(15,10))
ax = ax.flatten()

for i,j in enumerate(np.random.choice(df.shape[0], size=6, replace=False)):
    img = mpimg.imread(path+'/resized/'+df.ImageID.values[j]+'.jpg')
    h, w = img.shape[:2]
    coords = df.iloc[j,4:8]
    ax[i].imshow(img)
    ax[i].set_title(df.iloc[j,2])
    ax[i].add_patch(plt.Rectangle((coords[0]*w, coords[2]*h), 
                                  coords[1]*w-coords[0]*w, coords[3]*h-coords[2]*h, 
                                  edgecolor='red', facecolor='none'))

plt.tight_layout()
plt.show()
In [ ]:
# encode labels
df['Label'] = (df.LabelName == 'dog').astype(np.uint8)

Checkpoint and Save data¶

In [ ]:
!mkdir -p data
In [ ]:
np.save('data/img.npy', img_arr.astype(np.uint8))
np.save('data/y_label.npy', df.Label.values)
np.save('data/y_bbox.npy', df[['XMin', 'YMin', 'XMax', 'YMax']].values.astype(np.float32))

Baseline in SKLearn¶

Load data¶

In [ ]:
X = np.load('data/img.npy', allow_pickle=True)
y_label = np.load('data/y_label.npy', allow_pickle=True)
y_bbox = np.load('data/y_bbox.npy', allow_pickle=True)
In [ ]:
idx_to_label = {1:'dog', 0:'cat'} # encoder

Double check that it loaded correctly

In [ ]:
# plot random 6 images
fig, ax = plt.subplots(nrows=2, ncols=3, sharex=False, sharey=False,figsize=(15,10))
ax = ax.flatten()

for i,j in enumerate(np.random.choice(X.shape[0], size=6, replace=False)):
    coords = y_bbox[j] * 128
    ax[i].imshow(X[j].reshape(128,128,3))
    ax[i].set_title(idx_to_label[y_label[j]])
    ax[i].add_patch(plt.Rectangle((coords[0], coords[1]), 
                                  coords[2]-coords[0], coords[3]-coords[1], 
                                  edgecolor='red', facecolor='none'))

plt.tight_layout()
plt.show()

Classification¶

Split data¶

Create training and testing sets

In [ ]:
X_train, X_test, y_train, y_test_label = train_test_split(X, y_label, test_size=0.01, random_state=27)

Train¶

I'm choosing SGDClassifier because the data is large: it trains with stochastic gradient descent and supports early stopping. With this many parameters, a model can easily overfit, so it's important to find the point where it begins to overfit and stop there for optimal results.

In [ ]:
%%time
model = SGDClassifier(loss='log', n_jobs=-1, random_state=27, learning_rate='adaptive', eta0=1e-10, 
                      early_stopping=True, validation_fraction=0.1, n_iter_no_change=3)
# 0.2 validation TODO
model.fit(X_train, y_train)
CPU times: user 1min 10s, sys: 40.6 s, total: 1min 51s
Wall time: 1min 40s
Out[ ]:
SGDClassifier(early_stopping=True, eta0=1e-10, learning_rate='adaptive',
              loss='log', n_iter_no_change=3, n_jobs=-1, random_state=27)
In [ ]:
model.n_iter_
Out[ ]:
4

Did it stop too early? Let's retrain with a few more iterations to see. Note that SGDClassifier has a parameter called validation_fraction which splits a validation set from the training data to determine when it stops.

In [ ]:
X_train, X_valid, y_train, y_valid = train_test_split(X_train, y_train, test_size=0.1, random_state=27)
In [ ]:
model2 = SGDClassifier(loss='log', n_jobs=-1, random_state=27, learning_rate='adaptive', eta0=1e-10)

epochs = 30

train_acc = np.zeros(epochs)
valid_acc = np.zeros(epochs)
for i in tqdm(range(epochs)):
    model2.partial_fit(X_train, y_train, np.unique(y_train))
    
    #log
    train_acc[i] += np.round(accuracy_score(y_train, model2.predict(X_train)),3)
    valid_acc[i] += np.round(accuracy_score(y_valid, model2.predict(X_valid)),3)

In [ ]:
plt.plot(train_acc, label='train')
plt.plot(valid_acc, label='valid')
plt.title('CaDoD Training')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
In [ ]:
del model2

Evaluation¶

In [ ]:
expLog = pd.DataFrame(columns=["exp_name", 
                               "Train Acc", 
                               "Valid Acc",
                               "Test  Acc",
                               "Train MSE", 
                               "Valid MSE",
                               "Test  MSE",
                              ])
In [ ]:
exp_name = "Baseline: Linear Model"
# assign by column labels; a positional slice with .loc is deprecated
expLog.loc[0, ["exp_name", "Train Acc", "Valid Acc", "Test  Acc"]] = [exp_name] + list(np.round(
               [accuracy_score(y_train, model.predict(X_train)), 
                accuracy_score(y_valid, model.predict(X_valid)),
                accuracy_score(y_test_label, model.predict(X_test))],3))
In [ ]:
expLog
Out[ ]:
exp_name Train Acc Valid Acc Test Acc Train MSE Valid MSE Test MSE
0 Baseline: SGDClassifier 0.584 0.574 0.554 NaN NaN NaN
In [ ]:
y_pred_label = model.predict(X_test)
y_pred_label_proba = model.predict_proba(X_test)

fig, ax = plt.subplots(nrows=2, ncols=5, sharex=False, sharey=False,figsize=(15,6))
ax = ax.flatten()

for i in range(10):
    img = X_test[i].reshape(128,128,3)
    ax[i].imshow(img)
    ax[i].set_title("Ground Truth: {0} \n Prediction: {1} | {2:.2f}".format(idx_to_label[y_test_label[i]],
                                                                   idx_to_label[y_pred_label[i]],
                                                                   y_pred_label_proba[i][y_pred_label[i]]),
                   color=("green" if y_pred_label[i]==y_test_label[i] else "red"))

plt.tight_layout()
plt.show()

Regression with multiple targets $[y_1, y_2, y_3, y_4]$¶

Train a linear regression model on multiple target values $[y_1, y_2, y_3, y_4]$ corresponding to [x, y, w, h] of the bounding box containing the object of interest. For more details see SKLearn's manpage on LinearRegression

Split data¶

In [ ]:
X_train, X_test, y_train, y_test = train_test_split(X, y_bbox, test_size=0.01, random_state=27)
X_train, X_valid, y_train, y_valid = train_test_split(X_train, y_train, test_size=0.1, random_state=27)

Train¶

In [ ]:
%%time

from sklearn.linear_model import LinearRegression
# LinearRegression has a closed-form solution; Lasso or Ridge regularization could also be tried
model = LinearRegression(n_jobs=-1)
model.fit(X_train, y_train)

# might take a few minutes to train
CPU times: user 1h 26min 40s, sys: 5min 53s, total: 1h 32min 34s
Wall time: 17min 24s
Out[ ]:
LinearRegression(n_jobs=-1)

Evaluation¶

In [ ]:
expLog.iloc[0,4:] = list(np.round([mean_squared_error(y_train, model.predict(X_train)), 
          mean_squared_error(y_valid, model.predict(X_valid)), 
          mean_squared_error(y_test, model.predict(X_test))],3))

expLog
Out[ ]:
exp_name Train Acc Valid Acc Test Acc Train MSE Valid MSE Test MSE
0 Baseline: Linear Model 0.584 0.574 0.554 0 0.036 0.035
In [ ]:
y_pred_bbox = model.predict(X_test)

fig, ax = plt.subplots(nrows=2, ncols=3, sharex=False, sharey=False,figsize=(15,10))
ax = ax.flatten()

for i,j in enumerate(np.random.choice(X_test.shape[0], size=6, replace=False)):
    img = X_test[j].reshape(128,128,3)
    coords = y_pred_bbox[j] * 128
    ax[i].imshow(img)
    ax[i].set_title("Ground Truth: {0} \n Prediction: {1} | {2:.2f}".format(idx_to_label[y_test_label[j]],
                                                                   idx_to_label[y_pred_label[j]],
                                                                   y_pred_label_proba[j][y_pred_label[j]]),
                   color=("green" if y_pred_label[j]==y_test_label[j] else "red"))
    ax[i].add_patch(plt.Rectangle((coords[0], coords[1]), 
                                  coords[2]-coords[0], coords[3]-coords[1], 
                                  edgecolor='red', facecolor='none'))

plt.tight_layout()
plt.show()

Homegrown implementation¶

Implement a Homegrown Logistic Regression model. Extend the loss function from CXE to CXE + MSE, i.e., make it a complex multitask loss function where the resulting model predicts the class and bounding box coordinates at the same time.
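A minimal sketch of such a homegrown multitask model, assuming a sigmoid head for the class label and a linear head for the four box coordinates, trained jointly with full-batch gradient descent. The class name, the loss weight `lam`, and the learning rate `lr` are our own choices, not fixed by the assignment:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class HomegrownCaDoD:
    """Logistic-regression head for the class label plus a linear head
    for the 4 box coordinates, trained jointly on CXE + lam * MSE."""

    def __init__(self, n_features, lam=1.0, lr=0.1):
        self.w_cls = np.zeros(n_features)
        self.b_cls = 0.0
        self.w_box = np.zeros((n_features, 4))
        self.b_box = np.zeros(4)
        self.lam, self.lr = lam, lr

    def fit(self, X, y, bbox, epochs=200):
        n = X.shape[0]
        for _ in range(epochs):
            p = sigmoid(X @ self.w_cls + self.b_cls)   # class probabilities
            b = X @ self.w_box + self.b_box            # box predictions
            # gradient of mean CXE w.r.t. the classification head
            self.w_cls -= self.lr * (X.T @ (p - y)) / n
            self.b_cls -= self.lr * np.mean(p - y)
            # gradient of lam * mean MSE w.r.t. the regression head
            self.w_box -= self.lr * 2 * self.lam * (X.T @ (b - bbox)) / n
            self.b_box -= self.lr * 2 * self.lam * np.mean(b - bbox, axis=0)
        return self

    def predict(self, X):
        p = sigmoid(X @ self.w_cls + self.b_cls)
        return (p > 0.5).astype(int), X @ self.w_box + self.b_box
```

Because both heads share the same flattened-pixel features, one pass over the data updates classification and localization together, which is the multitask behavior the extended loss is meant to produce.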


Results / Discussion¶

Conclusion¶

In [ ]: